Pipeline data processing apparatus having small power consumption.

In a pipeline processing apparatus including a plurality of serially connected stages (ST1, ST2, ...), a
plurality of clock signals (CLK1, CLK2, ...) are supplied to the stages individually. The clock signals can be
individually stopped.
BACKGROUND OF THE INVENTION
Field of the Invention

The present invention relates to a pipeline processing apparatus including a plurality of serially-connected
stages, each stage having a plurality of flip-flops and a logic gate combination circuit.

Description of the Related Art 

A microprocessor such as a pipeline processing apparatus includes a plurality of stages each having a
plurality of flip-flops and a logic gate combination circuit. About half of the entire power consumption is
dissipated in the flip-flops. Also, about half of that half is dissipated in a clock driving circuit for driving a
clock signal supplied to the flip-flops, and the remainder of that half is dissipated in the flip-flops per se and
their outputs.

Generally, in a complementary metal oxide semiconductor (CMOS) large scale integrated circuit (LSI),
power consumption is mainly dependent upon dynamic power consumption caused by charging and
discharging operations performed upon a load capacity, and can be represented by (see Neil Weste et al.,
"PRINCIPLES OF CMOS VLSI DESIGN", pp. 144-149, 1985)

$$P = C_L V_{DD}^2 f_p \tag{1}$$

where
$P$ is a power consumption;
$C_L$ is a load capacity;
$V_{DD}$ is a power supply voltage; and
$f_p$ is the frequency of a signal. If the signal is a clock signal whose frequency is $f_c$, then $f_p = f_c$. If the
signal is an output signal of a flip-flop, then $f_p \approx 1/4\, f_c$ in view of the probability of transition of the
output signal from high to low and vice versa.

In the pipeline processing apparatus, however, the output signals of the flip-flops are not always
changed from high to low or vice versa in accordance with the clock signal. Each output signal of the
flip-flops may be changed once for ten clock signals on the average, and in this case, $f_p \approx 1/10\, f_c$. This
means that about 90 % of the power consumption dissipated in the clock driver circuit can be wasted. This
will be explained later in detail.
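As an illustration of equation (1) and of the 90 % figure, the following minimal Python sketch evaluates the dynamic power of a single clock input and the share of clock edges that arrive while a flip-flop output stays unchanged. The function names are hypothetical, and the numerical values (0.06 pF, 5 V, 50 MHz) are borrowed from the worked example given later in this description.

```python
def dynamic_power(c_load, v_dd, f_p):
    """Dynamic power P = C_L * V_DD**2 * f_p of equation (1), in watts."""
    return c_load * v_dd ** 2 * f_p


def wasted_clock_share(output_changes_per_clock):
    """Share of clock edges that arrive while the output does not change."""
    return 1.0 - output_changes_per_clock


# Assumed illustrative values: one 0.06 pF clock input at 5 V and 50 MHz.
p_clock = dynamic_power(0.06e-12, 5.0, 50e6)
print(f"clock input power: {p_clock * 1e6:.1f} uW")

# One output change per ten clock signals, as in the text above.
print(f"wasted share: {wasted_clock_share(1 / 10):.0%}")
```

The second function only restates the argument of the text: if the output changes on one clock edge in ten, the remaining nine edges drive the clock load without changing any stored data.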

SUMMARY OF THE INVENTION

It is an object of the present invention to reduce the power consumption of a pipeline processing 
apparatus including a plurality of serially-connected stages each having at least a plurality of flip-flops. 

According to the present invention, in a pipeline processing apparatus including a plurality of serially
connected stages, a plurality of clock signals are supplied to the stages individually. The clock signals can
be individually stopped, so that wasteful transitions of the clock signals are reduced.

BRIEF DESCRIPTION OF THE DRAWINGS 

The present invention will be more clearly understood from the description as set forth below, in
comparison with the prior art, with reference to the accompanying drawings, wherein:

Fig. 1 is a circuit diagram illustrating a prior art pipeline processing apparatus;
Figs. 2A through 2F are timing diagrams showing the operation of the circuit of Fig. 1;
Figs. 3A and 3B are circuit diagrams illustrating examples of flip-flops;
Fig. 4 is a circuit diagram illustrating a first embodiment of the pipeline processing apparatus according
to the present invention;
Figs. 5A through 5K are timing diagrams showing the operation of the circuit of Fig. 4;
Fig. 6 is a circuit diagram illustrating a second embodiment of the pipeline processing apparatus
according to the present invention;
Figs. 7A through 7K are timing diagrams showing the operation of the circuit of Fig. 6; and
Fig. 8 is a circuit diagram illustrating a third embodiment of the pipeline processing apparatus according
to the present invention.
DESCRIPTION OF THE PREFERRED EMBODIMENT

Before the description of the preferred embodiment, a prior art pipeline processing apparatus will be
explained with reference to Figs. 1, 2A through 2F, and 3A and 3B.

In Fig. 1, which illustrates a prior art pipeline processing apparatus, a plurality of stages ST1, ST2, ...
are provided. The first stage ST1 is comprised of flip-flops FF11, FF12, ... for receiving data DA1, DA2,
..., respectively, and a logic gate combination circuit C1 for receiving the output data DB1, DB2, ... of
the flip-flops FF11, FF12, .... In this case, the flip-flops FF11, FF12, ... are of a D-type which operate in
response to the rising edges of their clock inputs. Note that the logic gate combination circuit C1 is
comprised of logic gates such as AND circuits, NAND circuits, OR circuits and NOR circuits, but includes no
flip-flop and no latch circuit. Also, the data DA1, DA2, ... are supplied to the first stage ST1 through
selectors SEL1, SEL2, ... which are controlled by a stall signal STL for stopping the pipeline operation of
the pipeline processing apparatus. That is, when the stall signal STL is low (= "0"), the data DA1, DA2, ...
are supplied to the flip-flops FF11, FF12, ..., so that the contents of the flip-flops FF11, FF12, ... are
changed. On the other hand, when the stall signal STL is high (= "1"), the output signals of the flip-flops
FF11, FF12, ... are fed back to the inputs thereof, so that the contents of the flip-flops FF11, FF12, ... are
not changed.

Also, the second stage ST2 is comprised of flip-flops FF21, FF22, ... and a logic gate combination
circuit C2 for receiving the output data DC1, DC2, ... of the flip-flops FF21, FF22, .... The third stage ST3
and its post-stages have the same configuration as the second stage ST2.
Further, a clock signal CLK is supplied to all the flip-flops FF11, FF12, ..., FF21, FF22, ..., FF31, FF32,
..., and therefore, the flip-flops FF11, FF12, ..., FF21, FF22, ..., FF31, FF32, ... are simultaneously
operated.

The operation of the pipeline processing apparatus of Fig. 1 will now be explained with reference to 
Figs. 2A through 2F. 

As shown in Figs. 2A and 2B, the clock signal CLK and the data DA (DA1, DA2, ...) are always
generated. In this state, as shown in Fig. 2C, the stall signal STL is "0" at rising-edge timings t0, t1, t4 and
t5 of the clock signal CLK, so that the selectors SEL1, SEL2, ... select the data DA. As a result, the output
data DB (DB1, DB2, ...) of the flip-flops FF11, FF12, ... are data obtained by delaying the data DA by
one clock time period ΔT, as shown in Fig. 2D. On the other hand, as shown in Fig. 2C, the stall signal STL
is "1" at rising-edge timings t2, t3 and t6 of the clock signal CLK, so that the output data DB of the flip-flops
FF11, FF12, ... are not changed, as shown in Fig. 2D. Also, the second stage ST2 and its post-stages
always receive the clock signal CLK, and therefore, the operation results of the logic gate combination
circuits C1, C2, ... based upon the outputs of their prestage flip-flops are written into the flip-flops of the
second stage ST2 and its post-stages, as shown in Figs. 2E and 2F.
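The selector-plus-flip-flop behavior of the first stage can be sketched in a few lines of Python. This is a behavioral model only; the function name and the data labels are hypothetical, and the stall pattern is assumed to be "1" at timings t2, t3 and t6.

```python
def simulate_stalled_register(data_in, stall, initial=None):
    """Model one prior-art flip-flop together with its selector.

    At each rising clock edge the selector passes the new input when the
    stall signal is 0 and feeds the flip-flop output back when it is 1.
    Returns the flip-flop content after each edge.
    """
    outputs = []
    q = initial
    for d, stl in zip(data_in, stall):
        q = q if stl else d  # hold on stall, load otherwise
        outputs.append(q)
    return outputs


stall = [0, 0, 1, 1, 0, 0, 1]                          # assumed STL at t0..t6
data_in = ["a0", "a1", "a2", "a3", "a4", "a5", "a6"]   # hypothetical DA values

db = simulate_stalled_register(data_in, stall)
clock_edges = len(stall)      # the flip-flop is clocked at every timing
useful_edges = stall.count(0)  # edges that actually load new data
print(db)
print(f"{clock_edges - useful_edges} of {clock_edges} clock edges only rewrite old data")
```

The model makes the drawback visible: the flip-flop is clocked at all seven timings although its content changes at only four of them.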

In the pipeline processing apparatus of Fig. 1, however, even during time periods where the contents of
the flip-flops are not changed due to the stall signal STL, the flip-flops receive the clock signal CLK so as to
operate them (see t2 and t3 of Fig. 2D, t3 and t4 of Fig. 2E and t4 and t5 of Fig. 2F). This increases the
power consumption.

For example, if each of the flip-flops is of a static type as illustrated in Fig. 3A, the power consumption
of the pipeline processing apparatus of Fig. 1 is calculated below. Here, the following conditions are
assumed:

an input capacity of the clock signal CLK to each flip-flop = 0.06 pF;
an internal load capacity of each flip-flop = 0.07 pF;
an input capacity of the stall signal STL to each of the selectors SEL1, SEL2, ... = 0.04 pF;
an internal load capacity of each of the selectors SEL1, SEL2, ... = 0.05 pF;
the number of the flip-flops FF11, FF12, ... = 40;
the number of the flip-flops FF21, FF22, ... = 20;
the number of the flip-flops FF31, FF32, ... = 30;
an internal load capacity of the logic gate combination circuit C1 = 20 pF;
an internal load capacity of the logic gate combination circuit C2 = 10 pF;
$V_{DD}$ = 5 V;
the frequency $f_c$ of the clock signal CLK = 50 MHz;
the probability of "1" within the stall signal STL = 2/5, i.e., the frequency of the stall signal STL = 2/5
· 50 MHz; and
the frequency of other logic signals = $f_c/4$ (due to the fact that the probability of transition of the output
signal of each flip-flop at unstalled timings where the output signal is expected to change is 1/2, i.e., the
transition frequency is $1/4\, f_c$).
Therefore, from the equation (1),

$$
\begin{aligned}
P ={}& (40 + 20 + 30) \cdot 0.06 \cdot 10^{-12} \times 5^2 \times 50 \cdot 10^{6} \\
&+ (40 + 20 + 30) \cdot 0.07 \cdot 10^{-12} \times 5^2 \times 1/4 \cdot 2/5 \cdot 50 \cdot 10^{6} \\
&+ 40 \cdot 0.04 \cdot 10^{-12} \times 5^2 \times 2/5 \cdot 50 \cdot 10^{6} \\
&+ 40 \cdot 0.05 \cdot 10^{-12} \times 5^2 \times 1/4 \cdot 2/5 \cdot 50 \cdot 10^{6} \\
&+ (20 + 10) \cdot 10^{-12} \times 5^2 \times 1/4 \cdot 2/5 \cdot 50 \cdot 10^{6}
\end{aligned}
\tag{2}
$$

where the first term is a power consumption dissipated by the clock signal CLK; the second term is a
power consumption dissipated within the flip-flops; the third term is a power consumption dissipated by the
stall signal STL; the fourth term is a power consumption dissipated in the selectors SEL1, SEL2, ...; and
the fifth term is a power consumption dissipated in the logic gate combination circuits C1 and C2. The
equation (2) is represented by

$$P = (6.75 + 0.79 + 0.80 + 0.25 + 3.75) \cdot 10^{-3}\ \text{W} = 12.34\ \text{mW} \tag{3}$$
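The arithmetic of equations (2) and (3) can be checked with a short Python sketch. The variable and key names are hypothetical; the capacities, voltage, frequency and the factors 1/4 and 2/5 are those listed in the conditions above.

```python
V_DD = 5.0            # supply voltage in volts
F_CLK = 50e6          # clock frequency in hertz
STALL = 2 / 5         # factor 2/5 used for the stall signal in equation (2)
N_FF = 40 + 20 + 30   # flip-flops of the stages ST1, ST2 and ST3


def mw(c_pf, freq_hz):
    """Equation (1) for a capacity given in pF, result in milliwatts."""
    return c_pf * 1e-12 * V_DD ** 2 * freq_hz * 1e3


terms_mw = {
    "clock signal CLK": mw(N_FF * 0.06, F_CLK),
    "flip-flops": mw(N_FF * 0.07, F_CLK / 4 * STALL),
    "stall signal STL": mw(40 * 0.04, STALL * F_CLK),
    "selectors": mw(40 * 0.05, F_CLK / 4 * STALL),
    "logic circuits C1, C2": mw(20 + 10, F_CLK / 4 * STALL),
}
total_mw = sum(terms_mw.values())

for name, value in terms_mw.items():
    print(f"{name:>22}: {value:.4f} mW")
print(f"{'total':>22}: {total_mw:.4f} mW")
```

Rounded to two decimals, the five terms and the total agree with the values of equation (3).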

Thus, 55 percent of the entire power consumption (12.34 mW) is dissipated by the clock signal CLK,
and 2/5 of this power consumption (22 percent of the entire power consumption) is dissipated when the
pipeline processing apparatus is stalled. Also, 9 percent (0.80 + 0.25 mW) of the entire power consumption
is dissipated in the selectors SEL1, SEL2, .... In other words, 31 percent of the entire power consumption
does not contribute to the pipeline operation of the pipeline processing apparatus, and therefore, the 31
percent of the entire power consumption is wasteful.

On the other hand, if each of the flip-flops is of a dynamic type as illustrated in Fig. 3B, since the
number of transistors is reduced as compared with the static type flip-flop as illustrated in Fig. 3A, the input
capacity of the clock signal CLK to each flip-flop is decreased from 0.06 pF to 0.03 pF, and the internal load
capacity of each flip-flop is decreased from 0.07 pF to 0.04 pF. Therefore, the first term of the equation (2)
is replaced by

$$(40 + 20 + 30) \cdot 0.03 \cdot 10^{-12} \times 5^2 \times 50 \cdot 10^{6}$$

and the second term of the equation (2) is replaced by

$$(40 + 20 + 30) \cdot 0.04 \cdot 10^{-12} \times 5^2 \times 1/4 \cdot 2/5 \cdot 50 \cdot 10^{6}$$

Therefore, in this case, the entire power consumption P is represented by

$$P = (3.38 + 0.45 + 0.80 + 0.25 + 3.75) \cdot 10^{-3}\ \text{W} = 8.63\ \text{mW} \tag{4}$$

Thus, 39 percent of the entire power consumption (8.63 mW) is dissipated by the clock signal CLK, and
2/5 of this power consumption (16 percent of the entire power consumption) is dissipated when the pipeline
processing apparatus is stalled. Also, 12 percent (0.80 + 0.25 mW) of the entire power consumption is
dissipated in the selectors SEL1, SEL2, .... In other words, 28 percent of the entire power consumption
does not contribute to the pipeline operation of the pipeline processing apparatus, and therefore, the 28
percent of the entire power consumption is wasteful.
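The wasted shares quoted for the static and dynamic cases can be reproduced with the sketch below. It is a minimal Python illustration with hypothetical function names; it recomputes the prior art terms for both flip-flop types and then derives the share of the clock power spent during stalls and the share spent on the selectors (stall signal input plus selector internal load).

```python
V_DD = 5.0            # supply voltage in volts
F_CLK = 50e6          # clock frequency in hertz
STALL = 2 / 5         # stall factor used in equation (2)
N_FF = 40 + 20 + 30   # total number of flip-flops


def prior_art_terms_mw(c_clk_in_pf, c_ff_int_pf):
    """Evaluate the five terms of equation (2) in milliwatts."""
    def mw(c_pf, freq_hz):
        return c_pf * 1e-12 * V_DD ** 2 * freq_hz * 1e3

    return {
        "clock": mw(N_FF * c_clk_in_pf, F_CLK),
        "flip_flops": mw(N_FF * c_ff_int_pf, F_CLK / 4 * STALL),
        "stall_signal": mw(40 * 0.04, STALL * F_CLK),
        "selectors": mw(40 * 0.05, F_CLK / 4 * STALL),
        "logic": mw(20 + 10, F_CLK / 4 * STALL),
    }


def wasted_shares(terms):
    """Return (stalled clock share, selector share) of the total power."""
    total = sum(terms.values())
    stalled_clock = STALL * terms["clock"] / total
    selector = (terms["stall_signal"] + terms["selectors"]) / total
    return stalled_clock, selector


for name, caps in {"static": (0.06, 0.07), "dynamic": (0.03, 0.04)}.items():
    terms = prior_art_terms_mw(*caps)
    clock_part, selector_part = wasted_shares(terms)
    print(f"{name}: total {sum(terms.values()):.4f} mW, "
          f"stalled clock {clock_part:.1%}, selectors {selector_part:.1%}")
```

Each share rounds to the percentage stated in the text (22 and 9 percent for the static type, 16 and 12 percent for the dynamic type). The 31 percent figure for the static type is the sum of the two rounded shares; the unrounded sum is slightly lower, at about 30.4 percent.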

In Fig. 4, which illustrates a first embodiment of the present invention using static type flip-flops as
illustrated in Fig. 3A, flip-flops FF10, FF20, ... and OR circuits G1, G2, G3, ... are added to the elements
of Fig. 1, and the selectors SEL1, SEL2, ... of Fig. 1 are deleted. Each of the flip-flops FF10, FF20, ...
delays stall signals STL1 (= STL), STL2, ... by one clock time period ΔT. The OR circuits G1, G2, G3,
... turn ON and OFF the clock signal in accordance with the stall signals STL1, STL2, STL3, .... Thus,
the stall signals STL1, STL2, STL3, ... having one clock time period ΔT therebetween are generated. As a
result, the flip-flops FF11, FF12, ... are operated in accordance with a clock signal CLK1 which is an OR
logic between the clock signal CLK and the stall signal STL1, the flip-flops FF21, FF22, ... are operated in
accordance with a clock signal CLK2 which is an OR logic between the clock signal CLK and the stall signal
STL2, and the flip-flops FF31, FF32, ... are operated in accordance with a clock signal CLK3 which is an
OR logic between the clock signal CLK and the stall signal STL3.

The operation of the pipeline processing apparatus of Fig. 4 will now be explained with reference to
Figs. 5A through 5K.

As shown in Figs. 5A and 5B, the clock signal CLK and the data DA (DA1, DA2, ...) are also always
generated. In this state, as shown in Fig. 5C, the stall signal STL1 is "0" at rising-edge timings t0, t1, t4 and
t5 of the clock signal CLK and is "1" at rising-edge timings t2, t3 and t6 of the clock signal CLK. As a result,
the clock signal CLK1 for the flip-flops FF11, FF12, ... rises at only timings t0, t1, t4 and t5, as shown in
Fig. 5D. Therefore, the data DB (DB1, DB2, ...) of the flip-flops FF11, FF12, ... are changed at only
timings t0, t1, t4 and t5, as shown in Fig. 5E, and therefore, the output data DB are obtained by delaying the
data DA by one clock time period ΔT.

Also, as shown in Fig. 5F, the stall signal STL2 is delayed as compared with the stall signal STL1 by
one clock time period ΔT. Therefore, as shown in Fig. 5F, the stall signal STL2 is "0" at rising-edge timings
t0, t1, t2, t5 and t6 of the clock signal CLK and is "1" at rising-edge timings t3 and t4 of the clock signal
CLK. As a result, the clock signal CLK2 for the flip-flops FF21, FF22, ... rises at only timings t0, t1, t2, t5
and t6, as shown in Fig. 5G. Therefore, the data DC (DC1, DC2, ...) of the flip-flops FF21, FF22, ... are
changed at only timings t0, t1, t2, t5 and t6, as shown in Fig. 5H, and therefore, the output data DC are
obtained by delaying the data DB by one clock time period ΔT.

Also, as shown in Fig. 5I, the stall signal STL3 is delayed as compared with the stall signal STL2 by one
clock time period ΔT. Therefore, as shown in Fig. 5I, the stall signal STL3 is "0" at rising-edge timings t0,
t1, t2, t3 and t6 of the clock signal CLK and is "1" at rising-edge timings t4 and t5 of the clock signal CLK.
As a result, the clock signal CLK3 for the flip-flops FF31, FF32, ... rises at only timings t0, t1, t2, t3 and t6,
as shown in Fig. 5J. Therefore, the data DD (DD1, DD2, ...) of the flip-flops FF31, FF32, ... are changed
at only timings t0, t1, t2, t3 and t6, as shown in Fig. 5K, and therefore, the output data DD are obtained by
delaying the data DC by one clock time period ΔT.

Thus, according to the first embodiment, during a stall period where the change of the flip-flops FF11,
FF12, ..., FF21, FF22, ..., FF31, FF32, ... is unnecessary, the generation of the clock signals CLK1,
CLK2, CLK3, ... is stopped, thus reducing the power consumption.
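The timing relationship among the delayed stall signals and the gated clocks can be sketched as follows. This is a behavioral Python model with hypothetical function names, not a circuit description. It assumes that STL1 is "1" at timings t2, t3 and t6, that each of the flip-flops FF10 and FF20 initially holds "0", and that a gated clock CLKn = CLK OR STLn shows a rising edge at a timing only when STLn is "0" there.

```python
def delay_one_period(signal, initial=0):
    """Flip-flop (FF10 or FF20) delaying a stall signal by one clock period."""
    return [initial] + signal[:-1]


def rising_edges(stall):
    """Timings at which the gated clock CLKn = CLK OR STLn rises."""
    return [k for k, stl in enumerate(stall) if stl == 0]


stl1 = [0, 0, 1, 1, 0, 0, 1]   # assumed STL1 at t0..t6
stl2 = delay_one_period(stl1)
stl3 = delay_one_period(stl2)

for name, stall in (("CLK1", stl1), ("CLK2", stl2), ("CLK3", stl3)):
    edges = ", ".join(f"t{k}" for k in rising_edges(stall))
    print(f"{name} rises at {edges}")
```

Under these assumptions each stage receives five or fewer of the seven clock edges, and the two missing edges move one timing later from stage to stage, following the stall as it propagates down the pipeline.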

The actual power consumption of the pipeline processing apparatus of Fig. 4 will be explained as
compared with that of the pipeline processing apparatus of Fig. 1. The power consumption of the pipeline
processing apparatus of Fig. 4 is calculated below. Here, the following conditions are assumed:

an input capacity of the clock signal CLK to each flip-flop = 0.06 pF;
an internal load capacity of each flip-flop = 0.07 pF;
an input capacity of each of the OR circuits G1, G2, ... = 0.10 pF;
an internal load capacity of each of the OR circuits G1, G2, ... = 0.50 pF;
the number of the flip-flops FF11, FF12, ... = 40;
the number of the flip-flops FF21, FF22, ... = 20;
the number of the flip-flops FF31, FF32, ... = 30;
an internal load capacity of the logic gate combination circuit C1 = 20 pF;
an internal load capacity of the logic gate combination circuit C2 = 10 pF;
$V_{DD}$ = 5 V;
the frequency $f_c$ of the clock signal CLK = 50 MHz;
the probability of "1" within the stall signal STL = 2/5, i.e., the frequency of the stall signal STL = 2/5
· 50 MHz; and
the frequency of other logic signals = $f_c/4$.

Therefore, from the equation (1),

$$
\begin{aligned}
P ={}& (2 \cdot 0.06 + 3 \cdot 0.10) \cdot 10^{-12} \times 5^2 \times 50 \cdot 10^{6} \\
&+ (3 \cdot 0.50 + (40 + 20 + 30) \cdot 0.06) \cdot 10^{-12} \times 5^2 \times 2/5 \cdot 50 \cdot 10^{6} \\
&+ (40 + 20 + 30) \cdot 0.07 \cdot 10^{-12} \times 5^2 \times 1/4 \cdot 2/5 \cdot 50 \cdot 10^{6} \\
&+ (2 \cdot 0.07 + 3 \cdot 0.10) \cdot 10^{-12} \times 5^2 \times 2/5 \cdot 50 \cdot 10^{6} \\
&+ (20 + 10) \cdot 10^{-12} \times 5^2 \times 1/4 \cdot 2/5 \cdot 50 \cdot 10^{6}
\end{aligned}
\tag{5}
$$






where the first term is a power consumption dissipated by the clock signal CLK; the second term is a
power consumption dissipated in the OR circuits G1, G2 and G3 and by the clock signals CLK1, CLK2 and
CLK3; the third term is a power consumption dissipated within the flip-flops; the fourth term is a power
consumption dissipated by the stall signal STL; and the fifth term is a power consumption dissipated in the
logic gate combination circuits C1 and C2. The equation (5) is represented by

$$P = (0.53 + 3.45 + 0.79 + 0.22 + 3.75) \cdot 10^{-3}\ \text{W} = 8.74\ \text{mW} \tag{6}$$
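The terms of equation (5) can be evaluated with the following Python sketch. The names are hypothetical, the capacities are those of the conditions listed above, and the prior art total of 12.34 mW is taken from equation (3).

```python
V_DD = 5.0            # supply voltage in volts
F_CLK = 50e6          # clock frequency in hertz
STALL = 2 / 5         # factor 2/5 used in equation (5)
N_FF = 40 + 20 + 30   # total number of flip-flops
PRIOR_ART_MW = 12.34  # total of equation (3)


def mw(c_pf, freq_hz):
    """Equation (1) for a capacity given in pF, result in milliwatts."""
    return c_pf * 1e-12 * V_DD ** 2 * freq_hz * 1e3


first_embodiment_mw = [
    mw(2 * 0.06 + 3 * 0.10, F_CLK),             # clock signal CLK
    mw(3 * 0.50 + N_FF * 0.06, STALL * F_CLK),  # OR circuits and CLK1..CLK3
    mw(N_FF * 0.07, F_CLK / 4 * STALL),         # flip-flops
    mw(2 * 0.07 + 3 * 0.10, STALL * F_CLK),     # stall signal STL
    mw(20 + 10, F_CLK / 4 * STALL),             # logic circuits C1 and C2
]
total_mw = sum(first_embodiment_mw)
reduction = 1 - total_mw / PRIOR_ART_MW

print([f"{t:.4f}" for t in first_embodiment_mw])
print(f"total {total_mw:.4f} mW, reduction {reduction:.1%}")
```

Evaluated this way, the fourth term comes to 0.22 mW and the unrounded total to about 8.73 mW; summing the terms after rounding each to two decimals gives the 8.74 mW of equation (6). The reduction relative to the prior art is about 29 percent in either case.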

Thus, the entire power consumption (8.74 mW) in the first embodiment, validating all the rise edges of
the clock signals CLK1, CLK2 and CLK3, can be reduced by 29 percent as compared with that (12.34 mW)
of the prior art pipeline processing apparatus of Fig. 1. This is mainly because the power consumption by
the clock signals is reduced by 42 percent as compared with the prior art pipeline processing apparatus of
Fig. 1. Note that the reduction in power consumption by deleting the selectors SEL1, SEL2, ... and the
increase in power consumption by delaying the stall signal STL cancel each other.

In Fig. 6, which illustrates a second embodiment of the present invention using dynamic type flip-flops
as illustrated in Fig. 3B, an inverter G11, an AND circuit G12 and selectors SEL1', SEL2', ... are added to
the elements of Fig. 4, to thereby carry out a refresh operation by a refresh signal REF. The refresh signal
REF is a clock pulse signal which is made high (= "1") for every definite time period such as several μs.

The operation of the pipeline processing apparatus of Fig. 6 will now be explained with reference to
Figs. 7A through 7K. When the refresh signal REF is "0", the operation of the pipeline processing
apparatus of Fig. 6 is the same as that of the pipeline processing apparatus of Fig. 4 (see non-refresh mode
of Figs. 7A through 7K). That is, the output signal of the inverter G11 is "1", so that the AND circuit G12
passes the stall signal STL therethrough. In this case, STL1 = STL. Simultaneously, the selectors SEL1',
SEL2', ... select the data DA1, DA2, ..., respectively.

When the stall signal STL1 (= STL) is "1" and accordingly, the stall signals STL2 and STL3 are "1",
the refresh signal REF is made "1" as shown in Fig. 7D, so that the control enters a refresh mode. That is,
the selectors SEL1', SEL2', ... select the outputs DB1, DB2, ... (= DB), respectively, of the flip-flops FF11,
FF12, .... Also, since the refresh signal REF (= "1") is inverted by the inverter G11 and is supplied to the
AND circuit G12, the output signal of the AND circuit G12 is "0" regardless of the stall signal STL. That is,
the stall signal STL1 is "0" as shown in Fig. 7E. As a result, the clock signal CLK passes through the OR
circuit G1 only for a time period defined by the stall signal STL1 (= "0"), so that the clock signal CLK1 is
formed by the clock signal CLK, as shown in Fig. 7F. Therefore, the outputs DB1, DB2, ... (= DB) of the
flip-flops FF11, FF12, ... are again written thereinto, thus refreshing the flip-flops FF11, FF12, ....

Also, the stall signal STL1 is latched by the flip-flop FF10, and as a result, the stall signal STL2 is "0"
for one clock time period ΔT, as shown in Fig. 7H. Therefore, the clock signal CLK passes through the OR
circuit G2 only for a time period defined by the stall signal STL2 (= "0"), so that the clock signal CLK2 is
formed by the clock signal CLK, as shown in Fig. 7I. Thus, the outputs DC1, DC2, ... (= DC) of the
flip-flops FF21, FF22, ... are again written thereinto, thus refreshing the flip-flops FF21, FF22, ....

Further, the stall signal STL2 is latched by the flip-flop FF20, and as a result, the stall signal STL3 is "0"
for one clock time period ΔT, as shown in Fig. 7K. Therefore, the clock signal CLK passes through the OR
circuit G3 only for a time period defined by the stall signal STL3 (= "0"), so that the clock signal CLK3 is
formed by the clock signal CLK, as shown in Fig. 7L. Thus, the outputs DD1, DD2, ... (= DD) of the
flip-flops FF31, FF32, ... are again written thereinto, thus refreshing the flip-flops FF31, FF32, ....

Note that the provision of the selectors SEL1', SEL2', ... at the prestages of the flip-flops FF11, FF12,
... makes it definitely possible to carry out a refresh operation upon the flip-flops FF11, FF12, ... even
when the data DA (DA1, DA2, ...) are changed during a refresh mode. Contrary to this, during a stalling
period, the input values of the flip-flops FF21, FF22, ..., FF31, FF32, ... are not changed, and therefore, no
selectors are provided at the prestages of the flip-flops FF21, FF22, ..., FF31, FF32, ....
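The way a single refresh pulse travels down the stages can be sketched in Python as follows. The function names and the five-period trace are hypothetical; the sketch assumes that the pipeline is stalled throughout, that REF is "1" for one clock period, and that the flip-flops FF10 and FF20 initially hold "1".

```python
def stl1_from(stl, ref):
    """AND circuit G12 fed by STL and by REF inverted through the inverter G11."""
    return stl & (1 - ref)


def delay_one_period(signal, initial):
    """Flip-flop (FF10 or FF20) delaying a stall signal by one clock period."""
    return [initial] + signal[:-1]


stl = [1, 1, 1, 1, 1]   # assumed: stalled during the whole trace
ref = [0, 1, 0, 0, 0]   # assumed: one refresh pulse

stl1 = [stl1_from(s, r) for s, r in zip(stl, ref)]
stl2 = delay_one_period(stl1, initial=1)
stl3 = delay_one_period(stl2, initial=1)

# A stage is clocked, and its flip-flops rewritten, while its stall signal is 0.
for name, trace in (("CLK1", stl1), ("CLK2", stl2), ("CLK3", stl3)):
    print(name, "active in periods", [k for k, s in enumerate(trace) if s == 0])
```

In this trace each stage is clocked for exactly one period, one period after the preceding stage, which matches the stage-by-stage refresh described above.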

Thus, according to the second embodiment, during a stall period where the change of the flip-flops
FF11, FF12, ..., FF21, FF22, ..., FF31, FF32, ... is unnecessary, the generation of the clock signals
CLK1, CLK2, CLK3, ... is stopped, thus reducing the power consumption.

The actual power consumption of the pipeline processing apparatus of Fig. 6 will be explained as
compared with that of the pipeline processing apparatus of Fig. 1. The power consumption of the pipeline
processing apparatus of Fig. 6 is calculated below. Here, the following conditions are assumed:

an input capacity of the clock signal CLK to each flip-flop = 0.03 pF;
an internal load capacity of each flip-flop = 0.04 pF;
an input capacity of the refresh signal REF to each of the selectors SEL1', SEL2', ... = 0.05 pF;
an internal load capacity of each of the selectors SEL1', SEL2', ... = 0.05 pF;
an input capacity of each of the OR circuits G1, G2, ... = 0.10 pF;







EP 0 638 858 A1 



an internal load capacity of each of the OR circuits G1, G2, ... = 0.50 pF;
the number of the flip-flops FF11, FF12, ... = 40;
the number of the flip-flops FF21, FF22, ... = 20;
the number of the flip-flops FF31, FF32, ... = 30;
an internal load capacity of the logic gate combination circuit C1 = 20 pF;
an internal load capacity of the logic gate combination circuit C2 = 10 pF;
$V_{DD}$ = 5 V;
the frequency $f_c$ of the clock signal CLK = 50 MHz;
the probability of "1" within the stall signal STL = 2/5, i.e., the frequency of the stall signal STL = 2/5
· 50 MHz; and
the frequency of other logic signals = $f_c/4$.

Therefore, from the equation (1),

$$
\begin{aligned}
P ={}& (2 \cdot 0.03 + 3 \cdot 0.10) \cdot 10^{-12} \times 5^2 \times 50 \cdot 10^{6} \\
&+ (3 \cdot 0.50 + (40 + 20 + 30) \cdot 0.03) \cdot 10^{-12} \times 5^2 \times (2/5 + 1/200) \cdot 50 \cdot 10^{6} \\
&+ (40 + 20 + 30) \cdot 0.04 \cdot 10^{-12} \times 5^2 \times 1/4 \cdot 2/5 \cdot 50 \cdot 10^{6} \\
&+ (2 \cdot 0.04 + 3 \cdot 0.10) \cdot 10^{-12} \times 5^2 \times (2/5 + 1/200) \cdot 50 \cdot 10^{6} \\
&+ (20 + 10) \cdot 10^{-12} \times 5^2 \times 1/4 \cdot 2/5 \cdot 50 \cdot 10^{6} \\
&+ 40 \cdot 0.04 \cdot 10^{-12} \times 5^2 \times 1/200 \cdot 50 \cdot 10^{6} \\
&+ 40 \cdot 0.05 \cdot 10^{-12} \times 5^2 \times 1/4 \cdot (2/5 + 1/200) \cdot 50 \cdot 10^{6}
\end{aligned}
\tag{7}
$$

where the first term is a power consumption dissipated by the clock signal CLK; the second term is a
power consumption dissipated in the OR circuits G1, G2 and G3 and by the clock signals CLK1, CLK2 and
CLK3; the third term is a power consumption dissipated within the flip-flops; the fourth term is a power
consumption dissipated by the stall signal STL; the fifth term is a power consumption dissipated in the
logic gate combination circuits C1 and C2; and the sixth and seventh terms are a power consumption
dissipated in the selectors SEL1', SEL2', .... The equation (7) is represented by

$$P = (0.45 + 1.44 + 0.45 + 0.19 + 3.75 + 0.01 + 0.25) \cdot 10^{-3}\ \text{W} = 6.54\ \text{mW}$$
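The seven terms of equation (7) can be evaluated with the Python sketch below, in which all names are hypothetical. The internal load capacity of the OR circuits is kept as a parameter because the printed figures are not reproduced with every value: with the listed 0.50 pF the second term evaluates to about 2.13 mW and the total to about 7.23 mW, whereas the printed 1.44 mW and 6.54 mW correspond to 0.05 pF per OR circuit. Which value is intended is not stated in the text, so both are shown.

```python
V_DD = 5.0            # supply voltage in volts
F_CLK = 50e6          # clock frequency in hertz
STALL = 2 / 5         # stall factor of equation (7)
REFRESH = 1 / 200     # refresh factor of equation (7)
N_FF = 40 + 20 + 30   # total number of flip-flops


def mw(c_pf, freq_hz):
    """Equation (1) for a capacity given in pF, result in milliwatts."""
    return c_pf * 1e-12 * V_DD ** 2 * freq_hz * 1e3


def second_embodiment_terms_mw(c_or_internal_pf):
    """Seven terms of equation (7) for a given OR-circuit internal load."""
    gated = (STALL + REFRESH) * F_CLK
    return [
        mw(2 * 0.03 + 3 * 0.10, F_CLK),                # clock signal CLK
        mw(3 * c_or_internal_pf + N_FF * 0.03, gated),  # OR circuits, CLK1..CLK3
        mw(N_FF * 0.04, F_CLK / 4 * STALL),            # flip-flops
        mw(2 * 0.04 + 3 * 0.10, gated),                # stall signal STL
        mw(20 + 10, F_CLK / 4 * STALL),                # logic circuits C1 and C2
        mw(40 * 0.04, REFRESH * F_CLK),                # selectors, sixth term
        mw(40 * 0.05, gated / 4),                      # selectors, seventh term
    ]


for c_or in (0.50, 0.05):
    terms = second_embodiment_terms_mw(c_or)
    print(f"OR internal load {c_or:.2f} pF: second term {terms[1]:.4f} mW, "
          f"total {sum(terms):.4f} mW")
```

All terms other than the second agree with the printed values after rounding to two decimals, independently of the OR-circuit value chosen.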

Thus, the entire power consumption (6.54 mW) in the second embodiment, validating all the rise edges
of the clock signals CLK1, CLK2 and CLK3, can be reduced by 24 percent as compared with that (8.63
mW) of the prior art pipeline processing apparatus of Fig. 1. This is mainly because the power consumption
by the clock signals is reduced by 42 percent as compared with the prior art pipeline processing apparatus
of Fig. 1.

In the above-mentioned second embodiment as illustrated in Fig. 6, the logic gate combination circuits
C1 and C2 can be of a dynamic type. In this case, the clock signals CLK2 and CLK3 are supplied to the
logic gate combination circuits C1 and C2, respectively, as indicated by dotted arrows in Fig. 6. That is,
when the clock signal CLK2 (CLK3) is high, the logic gate combination circuit C1 (C2) is precharged, while
when the clock signal CLK2 (CLK3) is low, the logic gate combination circuit C1 (C2) carries out a logic
operation. Thus, since the clock signal CLK2 (CLK3) masked by the stall signal STL2 (STL3) is supplied to
the logic gate combination circuit C1 (C2), the power consumption dissipated by the clock signals can be
reduced. In the prior art pipeline processing apparatus of Fig. 1, the clock signal CLK which has more
transitions than the clock signals CLK2 and CLK3 may be supplied directly to the logic gate combination
circuits C1 and C2 which are of a dynamic type, thus increasing the power consumption.

In Fig. 8, which illustrates a third embodiment of the present invention, flip-flops FF11', FF12', ... and a
logic gate combination circuit C1' are connected in parallel to the flip-flops FF11, FF12, ... and the logic
gate combination circuit C1 of Fig. 4. In other words, the first stage consists of a double stage formed by
sub stages ST1 and ST1'. The sub stages ST1 and ST1' are switched by OR circuits G1 and G1' and selectors
SEL1" and SEL2" which are controlled by a decoding signal DEC.

In order to operate either the logic gate combination circuit C1 or the logic gate combination circuit C1',
either the clock signal CLK1 or the clock signal CLK1' is generated. For example, when the decoding signal
DEC is "1", the clock signal CLK1 is clocked, while when the decoding signal DEC is "0", the clock signal
CLK1' is clocked.
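The selection between the two sub stages can be sketched as a pair of gating expressions in Python. How the decoding signal DEC is wired into the OR circuits G1 and G1' is not detailed in the text; the sketch assumes that G1 receives DEC inverted and that G1' receives DEC directly, so that a blocked clock is held high in the same way as by a stall signal. The function name is hypothetical.

```python
def sub_stage_clocks(clk, stl1, dec):
    """Assumed gating for the sub stages of Fig. 8: returns (CLK1, CLK1')."""
    clk1 = clk | stl1 | (1 - dec)   # follows CLK only when DEC = 1 and STL1 = 0
    clk1_prime = clk | stl1 | dec   # follows CLK only when DEC = 0 and STL1 = 0
    return clk1, clk1_prime


# Enumerate the unstalled cases to show which sub stage sees the clock.
for dec in (1, 0):
    for clk in (0, 1):
        clk1, clk1_prime = sub_stage_clocks(clk, stl1=0, dec=dec)
        print(f"DEC={dec} CLK={clk} -> CLK1={clk1} CLK1'={clk1_prime}")
```

With this wiring only one of the two sub-stage clocks toggles at any time, so the flip-flops of the unselected sub stage receive no clock transitions.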

In the third embodiment as illustrated in Fig. 8, if the logic gate combination circuit C1 forms an
arithmetic and logic unit (ALU), the logic gate combination circuit C1' forms a barrel shifter, and the logic
gate combination circuit C2 forms a data cache memory, the pipeline processing apparatus of Fig. 8 can
serve as a microprocessor.

Thus, in the third embodiment as illustrated in Fig. 8, even when there is a logic gate combination
circuit which is seldom operated, the surplus power consumption therefor can be reduced.
In Fig. 8, a stage other than the first stage can be double-staged. Also, a multi-stage greater than a
double-stage can be adopted instead of the double-stage.

As explained hereinbefore, according to the present invention, in a pipeline processing apparatus, since
power consumption dissipated by a clock signal can be reduced, the overall power consumption dissipated
in the pipeline processing apparatus can be reduced.

Claims 

1. A pipeline processing apparatus comprising:
a plurality of serially-connected stages (ST1, ST2, ...); and
a clock signal generating means, connected to said stages, for generating a plurality of clock
signals (CLK1, CLK2, ...) and transmitting them to said stages individually, each of the clock signals
being for operating one of said stages.

2. An apparatus as set forth in claim 1, wherein said clock signal generating means receives a stall signal
(STL) for stopping an operation of said apparatus to stop generation of said clock signals individually.

3. An apparatus as set forth in claim 1, wherein said clock generating means comprises:
a plurality of stall signal generating means for generating a plurality of stall signals (STL1, STL2,
...); and
a plurality of gate circuits (G1, G2, ...), each connected to one of said stall signal generating
means, each of said gate circuits receiving a common clock signal (CLK) and one of said stall signals
and passing the common clock signal therethrough in accordance with one of said stall signals.

4. An apparatus as set forth in claim 3, wherein said plurality of stall signal generating means comprise a
plurality of serially-connected delay circuits for receiving a main stall signal (STL) and delaying it to
generate the plurality of stall signals.

5. An apparatus as set forth in claim 3, wherein said plurality of stall signal generating means comprises a
plurality of serially-connected flip-flops (FF10, FF20, ...) clocked by the common clock signal to
generate the plurality of stall signals having a delay time period therebetween determined by the main
clock signal.

6. An apparatus as set forth in claim 1, wherein said clock signal generating means comprises:
means for generating a plurality of stall signals (STL1, STL2, ...) having delay time periods in
response to operations of said stages; and
means for carrying out logic operations between the stall signals and a common clock signal (CLK)
to generate the clock signals in accordance with results of the logic operations.

7. An apparatus as set forth in claim 2, wherein said stages are of a dynamic type,
said apparatus further comprising means for receiving a refresh signal (REF) to stop generation of
the stall signal.

8. An apparatus as set forth in claim 3, wherein said stages are of a dynamic type,
said apparatus further comprising means for receiving a refresh signal (REF) to stop generation of
the stall signals.

9. An apparatus as set forth in claim 1, wherein each of said stages comprises:
a plurality of flip-flops (FF11, FF12, ..., FF21, FF22, ..., FF31, FF32, ...), each clocked by one of
the clock signals; and
a logic gate combination circuit (C1, C2, ...) connected to outputs of said flip-flops.

10. An apparatus as set forth in claim 9, wherein said logic gate combination circuit is of a dynamic type 
clocked by one of the clock signals. 
11. An apparatus as set forth in claim 1, wherein at least one of said stages includes a plurality of
parallelly-connected sub stages (ST1, ST1'),
said apparatus further comprising a decoding means (G1, G1') for receiving a decoding signal
(DEC) to select one of said sub stages.

12. A pipeline processing apparatus comprising:
a plurality of serially-connected stages (ST1, ST2, ...), each including a plurality of first flip-flops
(FF11, FF12, ..., FF21, FF22, ..., FF31, FF32, ...), each clocked by one of the clock signals; and a
logic gate combination circuit (C1, C2, ...) connected to outputs of said flip-flops;
a plurality of serially-connected second flip-flops (FF10, FF20, ...) for delaying a main stall signal
(STL1) to generate a plurality of stall signals (STL2, STL3, ...) having a delay time (ΔT) therebetween;
a gate means (G1) for passing a common clock signal (CLK) therethrough in accordance with the
main clock signal and transmitting a passed common clock signal to said first flip-flops of a first one of
said stages; and
a plurality of gate means (G2, G3, ...), each connected to one of said second flip-flops, each for
passing the common clock signal therethrough in accordance with one of the stall signals and
transmitting a passed common clock signal to said first flip-flops of one of said stages after the first
one.

13. An apparatus as set forth in claim 12, wherein said first flip-flops of said stages are of a dynamic type,
said apparatus further comprising means for receiving a refresh signal (REF) to stop generation of
the main stall signal.

14. An apparatus as set forth in claim 13, further comprising selector means (SEL1', SEL2', ...) for feeding
back output signals of said first flip-flops of the first stage in response to the refresh signal.

15. An apparatus as set forth in claim 13, wherein said logic gate combination circuit is of a dynamic type,
said logic gate combination circuit being connected to one of said gate means to receive the
passed main clock signal, so that said logic gate combination circuit carries out a precharging operation
and a logic operation alternatively.

16. An apparatus as set forth in claim 12, wherein at least one of said stages includes a plurality of
parallelly-connected sub stages (ST1, ST1'),
said apparatus further comprising a decoding means (G1, G1') for receiving a decoding signal
(DEC) to select one of said sub stages.